Skip to content

fix(web): add bestEffort mode to captureError for daemon transient errors#33160

Merged
vex-assistant-bot[bot] merged 1 commit into
mainfrom
devin/1780443059-lum-2189-2193-best-effort-capture-error
Jun 2, 2026
Merged

fix(web): add bestEffort mode to captureError for daemon transient errors#33160
vex-assistant-bot[bot] merged 1 commit into
mainfrom
devin/1780443059-lum-2189-2193-best-effort-capture-error

Conversation

@devin-ai-integration

Copy link
Copy Markdown
Contributor

Prompt / plan

Closes LUM-2189 — transient error resilience for refreshConversationRow
Closes LUM-2193captureError lacks best-effort mode for expected daemon transient errors

Supersedes #33151 (which used empty catches instead of the bestEffort approach).

Problem

captureError() had a binary gap between two error categories:

  1. Browser-level TypeError: Failed to fetch — silenced via isTransientNetworkError
  2. Everything else — reported to Sentry, including daemon startup errors (503 "still starting up", 502 bad-gateway, 401 auth-race, 400 org-header-not-ready) that are expected transient states, not bugs

This forced every best-effort background fetch to choose between captureError (over-reports expected startup noise → WEB-20) or empty .catch(() => {}) (silences real bugs like 500s and data-integrity errors). LUM-2114, LUM-2188, LUM-2189 each independently chose empty catches.

Changes

isExpectedDaemonTransientError() — new helper in capture-error.ts that identifies expected transient HTTP errors from daemon API calls:

  • ApiError(503) — daemon still booting
  • ApiError(502) — reverse proxy can't reach daemon pod
  • ApiError(401) — auth session race during login
  • ApiError(400) with "Organization-Id header" — org store not yet hydrated

Only ApiError instances match. TypeError, generic Error, and plain objects pass through.

captureError({ bestEffort: true }) — when the flag is set, isExpectedDaemonTransientError errors are silently dropped in addition to transient network errors. Unexpected errors (500, data-integrity, programming errors) still get reported. Without the flag, behavior is unchanged.

Call-site updates:

  • useActiveConversation — adds useIsOrgReady() gate (prevents fetch before org header is available) + captureError({ bestEffort: true }) (silences expected transients, reports real bugs)
  • useConversationSync.refreshRowcaptureError({ bestEffort: true })

CONVENTIONS.md — documents the bestEffort option in the "Manual error reporting" section.

Root cause analysis

  1. How did the code get into this state? captureError was designed for the browser-level network error case (PR that introduced isTransientNetworkError). HTTP error responses from the daemon (503/502/401/400) are valid responses, not network errors, so they weren't filtered. Each new background-fetch call site had to independently decide how to handle them.

  2. What led to the gap? The original captureError implementation assumed errors were either browser network failures (transient) or application errors (report). It didn't model the middle ground: HTTP responses that are expected during daemon startup but not application bugs.

  3. Warning signs? Multiple call sites (LUM-2114, LUM-2188) choosing empty catches over captureError was the signal that the abstraction had a gap. Each empty catch was a local workaround for the missing bestEffort mode.

  4. Prevention? The bestEffort option makes the correct default easy — future call sites use captureError({ bestEffort: true }) and get the right filtering without reinventing error-handling decisions.

Alternatives considered

  • Empty catches (what fix(web): silence transient errors from best-effort conversation row fetches #33151 did) — silences everything including real daemon bugs (500) and data-integrity errors. Rejected because it trades Sentry noise for silent bug masking.
  • Expand isTransientNetworkError to cover HTTP errors — semantically wrong. 503/502/401 are not network errors; they're valid HTTP responses. Conflating them would make the function name misleading and could affect other call sites that only want to filter network-level failures.
  • Per-call-site status-code allowlists — each caller checks error instanceof ApiError && [503, 502, ...].includes(error.status). Rejected because it duplicates the same logic at every call site and is error-prone (easy to forget a status code). The centralized isExpectedDaemonTransientError is the DRY version.

Test plan

  • bun test src/lib/sentry/capture-error.test.ts — 22 tests pass: isExpectedDaemonTransientError for all status codes, captureError with/without bestEffort flag, existing normalizeToError and captureError tests unchanged
  • bun test src/domains/chat/hooks/use-active-conversation.test.tsx — 6 tests pass: existing behavior + new "does not fetch when org is not ready" test
  • bunx tsc --noEmit — clean
  • bun run lint — no new warnings

Link to Devin session: https://app.devin.ai/sessions/c2e17ff1867f4ebd90aac007ea0f5453
Requested by: @ashleeradka

…rors

captureError previously had a binary gap: it silenced browser-level
TypeError: Failed to fetch (via isTransientNetworkError) but treated
every HTTP error response identically. This forced best-effort
background fetches to choose between captureError (over-reporting
expected daemon startup errors) and empty catches (silencing real bugs).

Add isExpectedDaemonTransientError() and a bestEffort option to
captureError. When bestEffort is true, expected daemon transient HTTP
errors (503/502/401/400-org-header) are silently dropped while
unexpected errors (500, data integrity, programming errors) still get
reported to Sentry.

Update refreshConversationRow call sites in useActiveConversation and
useConversationSync to use captureError({ bestEffort: true }) instead
of either bare captureError (which over-reports) or empty catches
(which under-reports).

Also add org-readiness gating to useActiveConversation to prevent
fetches from firing before the Vellum-Organization-Id header is
available.

Closes LUM-2189
Closes LUM-2193

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@linear

linear Bot commented Jun 2, 2026

Copy link
Copy Markdown

LUM-2189

LUM-2193

@devin-ai-integration

Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment, CI, and merge conflict monitoring

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

Open in Devin Review

@vex-assistant-bot vex-assistant-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✦ Vex Review: Approved

Clean, defensible widening of . The flag is opt-in and additive — existing callers unchanged.

** shape:**

  • Only matches (browser-level /network errors continue to flow through )
  • 503/502/401 by status alone — safe; these statuses on a daemon API call don't have other plausible meanings
  • 400 narrowed by — brittle on message text, but documented in comment and the alternative (matching all 400s as transient) would mask real client bugs. Worth keeping an eye on if the daemon error message ever changes.

Call-site fixes:

  • : real fix is the new gate — same hydration-race pattern as #32912. Wrapping captureError with is defense-in-depth for the residual 502/503/auth-race windows.
  • : trading for does lose the warning-tier surfaceability for these errors entirely; that's intentional — the whole point is to stop them from appearing in Sentry when transient.

Anti-pattern grep on the diff: clean. Zero casts, zero non-null , zero bare . Tests cover 7 predicate cases (positive + negative) plus 3 captureError integration cases including the "still report 500 under bestEffort" guard rail.

Closes LUM-2189 + LUM-2193. Supersedes #33151 cleanly.

CI 7/7 green at HEAD.

@vex-assistant-bot vex-assistant-bot Bot merged commit bb54683 into main Jun 2, 2026
7 checks passed
@vex-assistant-bot vex-assistant-bot Bot deleted the devin/1780443059-lum-2189-2193-best-effort-capture-error branch June 2, 2026 23:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant